Text Classification Using Genetic Programming with Implementation of Map Reduce and Scraping
Authors
Abstract
Classification of text documents on online media is a big data problem and requires automation. Text classification accuracy can decrease if there are many ambiguous terms between classes. Hadoop MapReduce is a parallel processing framework that has been widely used for big data. This study presents text classification using genetic programming, with pre-processing by MapReduce and data collection by web scraping. Genetic programming is used to perform association rule mining (ARM) before classification in order to analyze text patterns. The data are articles from ScienceDirect retrieved with three keywords. The study aims to combine ARM-based pattern analysis with a data collection system through web scraping, MapReduce, and genetic programming. Through scraping, data were collected with duplicates reduced by as much as 17718. MapReduce tokenization and stop-word removal produced 36639 terms: 5189 unique terms and 31450 common terms. Evaluation of ARM with different amounts of data shows that the multi-tree can produce more and longer rules with better support. It also produces more specific rules and better performance than a single tree. The classification evaluation shows that the single tree scores highest (0.7042), followed by the decision tree (0.6892), with the multi-tree lowest (0.6754). This is not in line with the other evaluation results, where the best result is 0.3904, followed by 0.3588 and 0.356.
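The pre-processing and rule-evaluation steps named in the abstract can be illustrated with a minimal, single-process Python sketch. It is not the authors' Hadoop job: the stop-word list, the sample documents, and the function names `map_phase`, `reduce_phase`, `support`, and `confidence` are all assumptions made for illustration. The map step tokenizes and drops stop words, the reduce step sums term counts, and the last two functions compute the standard support and confidence measures used in association rule mining.

```python
import re
from collections import defaultdict
from itertools import chain

# Hypothetical stop-word list; the study's actual list is not given.
STOP_WORDS = {"the", "of", "and", "a", "is", "to", "in", "for"}

def map_phase(doc):
    """Tokenize one document and emit (term, 1) pairs, skipping stop words."""
    for token in re.findall(r"[a-z]+", doc.lower()):
        if token not in STOP_WORDS:
            yield token, 1

def reduce_phase(pairs):
    """Sum the counts emitted for each term."""
    counts = defaultdict(int)
    for term, n in pairs:
        counts[term] += n
    return dict(counts)

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent).

    Raises ZeroDivisionError if the antecedent never occurs.
    """
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# Made-up sample documents, standing in for scraped article text.
docs = [
    "Classification of text is a big data problem",
    "Genetic programming for text classification",
    "Map reduce for big data",
]

# Term counts over the whole collection (map, then reduce).
term_counts = reduce_phase(chain.from_iterable(map_phase(d) for d in docs))

# One transaction (set of terms) per document, as input for rule mining.
transactions = [{term for term, _ in map_phase(d)} for d in docs]
```

With these sample documents, a rule such as "text implies classification" can be scored by calling `confidence({"text"}, {"classification"}, transactions)`. In a real Hadoop deployment the map and reduce functions would run as distributed tasks rather than as in-memory generators.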
Similar resources
Automatic Verification of Authentication Protocols Using Genetic Programming
Implicit and unobserved errors and vulnerabilities usually arise in cryptographic protocols, especially in authentication protocols. This may enable an attacker to cause serious damage to the desired system, such as gaining access to or changing secret documents, interfering in bank transactions, gaining access to users’ accounts, or possibly taking control of the whole syste...
Evolving Text Classification Rules with Genetic Programming
We describe a novel method for using Genetic Programming to create compact classification rules using combinations of N-Grams (character strings). Genetic programs acquire fitness by producing rules that are effective classifiers in terms of precision and recall when evaluated against a set of training documents. We describe a set of functions and terminals and provide results from a classifica...
Dimensionality Reduction and Improving the Performance of Automatic Modulation Classification using Genetic Programming (RESEARCH NOTE)
This paper shows how genetic programming can be used to select suitable features for automatic modulation recognition. Automatic modulation recognition is one of the essential components of modern receivers. In this regard, the selection of suitable features may significantly affect the performance of the process. Simulations were conducted with SNRs of 5 dB and 10 dB. Test and ...
Topic Modeling and Classification of Cyberspace Papers Using Text Mining
The global cyberspace networks provide individuals with platforms to interact, exchange ideas, share information, provide social support, conduct business, create artistic media, play games, engage in political discussions, and much more. The term cyberspace has become a conventional means to describe anything associated with the Internet and the diverse Internet culture. In fact, cyberspac...
Map Reduce Text Clustering Using Vector Space Model
Information retrieval is the area of finding particular web pages via a query to an internet search engine. Even though sophisticated algorithms and data structures are used in traditional computing techniques to create indexes that efficiently organize and retrieve information, data mining techniques such as clustering are currently used to enhance the efficiency of the retrieval process. ...
Journal
Journal title: JOIV: International Journal on Informatics Visualization
Year: 2023
ISSN: 2549-9610, 2549-9904
DOI: https://doi.org/10.30630/joiv.7.2.1813